<img src="http://www.tveskimo.com/wp-content/uploads/2017/11/The-Wire-Back-Burners-Recap-Season-3-Episode-7-2-1.png" height="50%" width="50%" style="float:left" />
Under Title III of The Omnibus Crime Control and Safe Streets Act of 1968 and 18 U.S.C. § 2519, the Administrative Office of the United States Courts (AO) is required to publish annual federal and state records of interception orders for wire, oral, or electronic communications. For each intercept, the AO's annual report includes information on the total cost and duration, the offense under investigation, the jurisdiction, and other data. The full set of variables is listed below.
State
Jurisdiction
AO Number
Judge
Prosecutor
Offense
Intercept Type
Location
Application Date
Original Order (Days)
Number of Extensions
Total Length (Days)
Installation
Number of Days in Operation
Average Intercepts per Day
Number of Persons Intercepted
Number of Incriminating Intercepts
Total Cost in Dollars
Other Than Manpower Cost in Dollars
Arrests
Trials
Motions to Suppress Intercepts
Persons Convicted
Cost Related
Results Related
These data reveal important information about the use of surveillance in criminal investigations. Where are wiretaps used most, and for what types of crimes are they most often deployed? How do wiretap characteristics vary by crime, geography, and time period? What underlying changes in technology, policy, or society might explain these trends? Analyzing wiretap records also allows us to gauge their efficacy as an investigative tool. How often do wiretaps result in arrests or convictions? Do their benefits outweigh their costs? The overarching objective of this report is to answer some of these questions.
The breakup of AT&T and the Bell System, set in motion by a 1982 consent decree and completed in 1984, led to a period of renewed competition, growth, and innovation in the telecommunications industry. As the number of telecommunications companies proliferated and new technologies flooded the market, the FBI struggled to keep pace with these new sources of complexity. Worried about how this would affect its ability to conduct surveillance, the FBI pursued legislation in Congress that would protect its wiretapping abilities from the effects of potentially disruptive technologies. The FBI's argument in favor of legislative protection rested on a key unstated assumption: that wiretaps are indeed vital tools for law enforcement. Is this assumption valid? In Privacy on the Line, Susan Landau and Whitfield Diffie (2010) attempt to find out by digging into the data on wiretaps. They draw mostly on records from 1988 to 1994, but also include data from 1968 to 2006. Their analysis yields insights into the efficacy of wiretaps as well as other aspects of their usage.
When federal wiretap regulation went into effect in 1968, the majority of wiretaps (64%) were used for investigating gambling cases. Since then, however, the share of wiretaps devoted to narcotics investigations has steadily increased. In 1994, narcotics investigations accounted for 77% of all cases using electronic surveillance. As electronic surveillance became increasingly concentrated on narcotics investigations, the average wiretap became longer and more costly. Between 1968 and 1994, the average cost of a wiretap rose from $1,358 to $49,478, and the average length doubled from 20 to nearly 40 days (p. 209). This shift was largely due to the lengthy timelines involved in drug investigations, which can span months or years.
Wiretaps occasionally result in large drug busts, which are often then used by advocates to defend the practice of wiretapping. It is unlikely that wiretaps actually affect the underlying problem of consumption, however, or that they are a better use of funds than alternative methods of stemming drug use (p. 211). The FBI has likewise cited the importance of wiretaps in investigations of kidnapping and domestic terrorism. Between 1968 and 1994, however, electronic surveillance played a role in just 2-3 kidnapping cases per year, and domestic terrorism cases are more likely to be investigated under the Foreign Intelligence Surveillance Act (p. 211).
At the state level, 48 jurisdictions have laws that authorize courts to issue orders for oral, wire, or electronic surveillance (AO, 2017). In 2017, the states with the most wiretaps were California, New York, Nevada, and North Carolina. Since the 1990s, California has seen a massive increase in wiretaps: from 8 in 1994 to 225 in 2017 (p. 212; AO, 2017).
Many of the trends first observed by Landau and Diffie have continued apace in the 21st century, but two major changes in wiretapping have occurred since the 1990s. First, portable devices became the most commonly wiretapped devices as people switched from landlines to mobile phones. Because portable devices increased the number of daily communications per person, the first change begot a second: the government began intercepting ever-greater volumes of communications. The number of intercepted conversations has increased from around 400,000 in 1968 to over 2 million (pp. 214-215).
Landau and Diffie note several limitations in the Wiretap Report data. First, its statistics do not distinguish wiretaps from bugs. Second, due to reporting issues the data may actually underestimate the total number of intercepts installed. Third, the data lack important contextual information from the court hearings themselves.
Which types of wiretaps are the most expensive? Which are the longest? Which are most effective in terms of arrests and convictions?
Can we come up with a metric for assessing the efficiency of wiretaps? Maybe arrests/dollar spent, arrests/intercept? Can we rank jurisdictions based on these metrics?
Are there any notable differences between wiretaps that were reported vs. not reported?
#import libraries
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
#read federal intercept data
fed17 = pd.read_csv('fedwire_2017.csv')
fed17.head()
fed17.Jurisdiction.unique()
fed17.info()
#summarize data
fed17.describe()
Is the conviction rate worth the cost? We will examine mean costs later in the analysis.
import missingno as msno
fed17.isnull().sum()
#Missing values matrix
msno.matrix(fed17.sample(100))
We only have consistent data for some of the variables. Many observations are missing values from Number of Days in Operation to Results Related. What do these observations have in common?
#How many wiretaps were reported by the prosecutor?
fed17['Installation'].value_counts()
# get rid of 'never installed'
fed17 = fed17[fed17.Installation != 'NEVER INSTALLED']
# very few records report AG or designee, so drop this column
fed17 = fed17.drop('Attorney General or Designee', axis=1)
# New dataframe: include only taps without prosecutor reports
fed_no = fed17.loc[fed17['Installation'] == 'NO PROSECUTOR REPORT']
# check missing values
msno.matrix(fed_no.sample(100))
The missing values matrix above, computed only over wiretaps with no prosecutor report, shows that these records are missing precisely the variables that gap in the full dataset. This confirms that the pattern of missing values stems from data the prosecutors never submitted.
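One quick way to verify this reporting pattern numerically is to compare the share of missing values per column across installation statuses. A minimal sketch on toy data (the column names mirror fed17, but the values are made up):

```python
import numpy as np
import pandas as pd

# Toy stand-in for fed17 (illustrative values only)
toy = pd.DataFrame({
    'Installation': ['INSTALLED AND USED', 'INSTALLED AND USED',
                     'NO PROSECUTOR REPORT'],
    'Arrests': [4, 2, np.nan],
    'Total Cost($)': ['50,000', '30,000', np.nan],
})

# Fraction of missing values per column, split by installation status
null_share = toy.isnull().groupby(toy['Installation']).mean()
print(null_share)
```

If the 'NO PROSECUTOR REPORT' row shows 100% missingness for the prosecutor-reported variables, that confirms the pattern seen in the matrix.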
Among the jurisdictions where prosecutor reports were not submitted, which rank the highest?
fed_no['Jurisdiction'].value_counts()
For each jurisdiction, what percentage of total wiretaps were not accompanied by a prosecutor report?
percent = fed17.groupby(['Jurisdiction'])['Installation'].value_counts(normalize=True, ascending = False) * 100
percent = percent.unstack(level=1)
percent = percent.fillna(0)
percent = percent.round(1)
percent.sort_values(by='NO PROSECUTOR REPORT', ascending=False)
Now we know which jurisdictions are most likely not to submit prosecutor reports. Their low percentage of prosecutor reports might be due to a lack of personnel and resources, or perhaps because the prosecutors in those districts do not regard filing reports as a priority. The AO explains that reports may be missing because "some prosecutors may have delayed filing reports to avoid jeopardizing ongoing investigations. Some of the prosecutors’ reports require additional information to comply with reporting requirements or were received too late to include in this document. Information about these wiretaps should appear in future reports" (AO, 2017).
Regardless of why reports are missing, comparing percentages across jurisdictions may be misleading, because some jurisdictions have dozens of records while others have only a handful. Northern West Virginia is at the top of the list for percentage of reports not filed, but it ordered only four wiretaps. New Jersey is near the top, but it had 63 wiretaps.
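To guard against over-reading small jurisdictions, one option is to report raw counts alongside the percentages. A sketch on toy data (the jurisdiction names and figures here are illustrative, not the real 2017 values):

```python
import pandas as pd

# Toy stand-in for fed17 (illustrative counts only)
toy = pd.DataFrame({
    'Jurisdiction': ['NORTHERN WEST VIRGINIA'] * 4 + ['NEW JERSEY'] * 10,
    'Installation': (['NO PROSECUTOR REPORT'] * 3 + ['INSTALLED AND USED'] +
                     ['NO PROSECUTOR REPORT'] * 4 + ['INSTALLED AND USED'] * 6),
})

# Total wiretaps and percent without a report, side by side
summary = (toy.groupby('Jurisdiction')['Installation']
              .agg(total='size',
                   no_report=lambda s: (s == 'NO PROSECUTOR REPORT').mean() * 100)
              .sort_values('no_report', ascending=False))
print(summary)
```

A small jurisdiction can top the percentage ranking on a handful of wiretaps; the total column makes that visible.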
In order to evaluate wiretaps in terms of cost, timespan, arrests, and other factors, we will have to look only at records for which a prosecutor report was filed. Before doing so, however, we will look at the variables that are not dependent on prosecutor reporting to see what insights we might derive.
Jurisdiction
Judge
Location
Type
Offense
Application Date
Original Order (Days)
Number of Extensions
Total Length (Days)
# Number of wiretaps per jurisdiction
fed17['Jurisdiction'].value_counts()
#Number of wiretaps per offense
fed17['Offense'].value_counts()
#Number of wiretaps by type
fed17['Type'].value_counts()
Data Definitions for Wiretap Type:
WC = Cellular or Mobile Telephone (Wire)
WS = Standard Telephone (Wire)
WO = Other (Wire)
OM = Microphone (Oral)
OO = Other (Oral)
AP = App (Electronic)
ED = Digital Pager (Electronic)
EE = Computer or E-Mail (Electronic)
EF = Fax Machine (Electronic)
EO = Other (Electronic)
TX = Text Message (Electronic)
Note: Unlike the earlier data analyzed by Landau and Diffie, it appears that the AO now distinguishes between wiretaps and bugs.
Almost all wiretap orders are for cell phones and text messages.
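For readable plot labels later, the AO's type codes above can be recoded with a simple mapping (the label strings are my own shorthand, not AO terminology):

```python
import pandas as pd

# Shorthand labels for the AO type codes listed above (labels are my own)
type_labels = {
    'WC': 'Cell phone (wire)', 'WS': 'Standard phone (wire)', 'WO': 'Other (wire)',
    'OM': 'Microphone (oral)', 'OO': 'Other (oral)',
    'AP': 'App (electronic)', 'ED': 'Digital pager (electronic)',
    'EE': 'Computer/e-mail (electronic)', 'EF': 'Fax (electronic)',
    'EO': 'Other (electronic)', 'TX': 'Text message (electronic)',
}

codes = pd.Series(['WC', 'TX', 'OM'])
print(codes.map(type_labels))
```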
fed17['Application Date'].value_counts()
# want to get counts per month for just 2017
# first need to convert column to datetime format
fed17['Application Date'] = pd.to_datetime(fed17['Application Date'])
# extract year and month
fed17['year'] = pd.DatetimeIndex(fed17['Application Date']).year
fed17['month'] = pd.DatetimeIndex(fed17['Application Date']).month
plot = pd.DataFrame(fed17.loc[fed17['year'] == 2017].month.value_counts())
plot.reset_index(level=0, inplace=True)
plot = plot.sort_values(by='index', ascending=True)
plot = plot.rename(columns={"index": "month", "month": "count"})
plot.head()
sns.lineplot(x='month', y = 'count', data=plot)
Wiretap orders peaked in March 2017 and then steadily declined through December.
#New dataframe: keep only records that were reported
fed_rep = fed17.loc[fed17['Installation'] == 'INSTALLED AND USED'].copy()
fed_rep.describe()
msno.matrix(fed_rep.sample(100))
The data are still patchy in places, but for the most part we now have complete records.
#Which criminal investigations involve the longest wiretaps?
fed_rep.groupby('Offense')['Total Length (Days)'].mean()
The 2017 data do not support the earlier assertion that narcotics wiretaps are the longest; note the averages for gambling and corruption.
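When comparing mean lengths across offenses, it also helps to see group sizes, since a single long wiretap can dominate a small category. A sketch on toy numbers (not the real AO figures):

```python
import pandas as pd

# Toy stand-in for fed_rep (lengths are made up)
toy = pd.DataFrame({
    'Offense': ['NARCOTICS', 'NARCOTICS', 'GAMBLING', 'CORRUPTION', 'CORRUPTION'],
    'Total Length (Days)': [30, 60, 110, 90, 150],
})

# Mean length with the number of wiretaps behind each mean
length = (toy.groupby('Offense')['Total Length (Days)']
             .agg(['mean', 'count'])
             .sort_values('mean', ascending=False))
print(length)
```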
fed_rep['Total Cost($)'] = fed_rep['Total Cost($)'].str.replace(',','')
fed_rep.to_csv('fed_rep.csv')
#load mapping libraries
import geopandas as gpd
import pysal as ps
from pysal.contrib.viz import mapping as maps
import palettable as pltt
from seaborn import palplot
#create link to file directory
jur_dir = 'US_District_Court_Jurisdictions/'
#call file
court_jur = jur_dir + 'US_District_Court_Jurisdictions.shp'
#read shapefile with geopandas
jur = gpd.read_file(court_jur)
# note: DISTRICT is kept as a regular column (needed for filtering and merging below)
jur.plot()
#remove jurisdictions that cause scaling problems when plotting
jur = jur[jur.DISTRICT != 'GUAM']
jur = jur[jur.DISTRICT != 'ALASKA']
jur = jur[jur.DISTRICT != 'NORTHERN MARIANA ISLANDS']
jur = jur[jur.DISTRICT != 'VIRGIN ISLANDS']
jur = jur[jur.DISTRICT != 'HAWAII']
jur = jur[jur.DISTRICT != 'PUERTO RICO']
jur = jur[jur.DISTRICT != 'BERMUDA']
jur.plot()
#check dataframe and projection
jur.crs
#UTM Zone 14
jur.to_crs("+proj=utm +zone=14 +ellps=WGS84 +datum=WGS84 +units=m +no_defs").plot()
#Massachusetts mainland
jur.to_crs(epsg=2805).plot()
#World Mercator projection
jur.to_crs(epsg=3395).plot()
jur = jur.to_crs("+proj=utm +zone=14 +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
#Set up figure and axis with different size
f, ax = plt.subplots(1, figsize=(15,15))
#Add layer of polygons on the axis
jur.plot(ax=ax)
#Remove axis
ax.set_axis_off()
#Display
plt.show()
fed17.info()
print(jur.DISTRICT.unique())
print(fed17.Jurisdiction.unique())
#I want to merge shapefiles with Federal Wiretap reports on Jurisdiction/District.
#The problem is that there are inconsistencies in the spellings.
#I will modify the shapefile rather than fed17, because it is likely that I will work with
#wiretap files from other years that are formatted the same as fed17. Better to modify the
#shapefile once so that its formatting is consistent with the other files in the future.
#I need to replace each space in the shapefile's district names with a comma and a space.
jur['DISTRICT'] = jur['DISTRICT'].str.replace(' ', ', ')
print(jur.DISTRICT.unique())
#The above command caused problems for two-word state names
#New York
jur['DISTRICT'] = jur['DISTRICT'].str.replace('W, Y', 'W Y')
#New Hampshire
jur['DISTRICT'] = jur['DISTRICT'].str.replace('W, H', 'W H')
#New Jersey
jur['DISTRICT'] = jur['DISTRICT'].str.replace('W, J', 'W J')
#West Virginia
jur['DISTRICT'] = jur['DISTRICT'].str.replace('T, V', 'T V')
#North Carolina, South Carolina
jur['DISTRICT'] = jur['DISTRICT'].str.replace('H, C', 'H C')
#New Mexico
jur['DISTRICT'] = jur['DISTRICT'].str.replace('W, M', 'W M')
#North Dakota, South Dakota
jur['DISTRICT'] = jur['DISTRICT'].str.replace('H, D', 'H D')
#District of Columbia
jur['DISTRICT'] = jur['DISTRICT'].str.replace('T, OF, C', 'T OF C')
#Rhode Island
jur['DISTRICT'] = jur['DISTRICT'].str.replace('E, I', 'E I')
print(jur.DISTRICT.unique())
jur['DISTRICT'] = jur['DISTRICT'].str.replace('T,H', 'TH')
jur['DISTRICT'] = jur['DISTRICT'].str.replace('E,W', 'EW')
jur['DISTRICT'] = jur['DISTRICT'].str.replace('S,T', 'ST')
print(jur.DISTRICT.unique())
#drop records from fed17 for just contiguous US
fed_jur17 = fed17[fed17.Jurisdiction != 'ALASKA']
fed_jur17 = fed_jur17[fed_jur17.Jurisdiction != 'HAWAII']
fed_jur17 = fed_jur17[fed_jur17.Jurisdiction != 'RHODE ISLAND']
fed_jur17 = fed_jur17[fed_jur17.Jurisdiction != 'PUERTO RICO']
fed_jur17.info()
jur_tap = pd.merge(fed_jur17, jur, left_on= 'Jurisdiction', right_on= 'DISTRICT')
jur_tap.head()
jur_tap.info()
jur_tap = jur_tap.drop(columns='DISTRICT')
jur_tap = jur_tap.set_index('Jurisdiction')
jur_tap.head()
fed_count = pd.DataFrame(fed17['Jurisdiction'].value_counts())
fed_count = fed_count.reset_index()
fed_count = fed_count.rename(columns={"index": "Jurisdiction", "Jurisdiction": "Count"})
jur_count = pd.merge(fed_count, jur, left_on= 'Jurisdiction', right_on= 'DISTRICT')
jur_count.set_index('Jurisdiction')
jur_count.info()
jur_count = gpd.GeoDataFrame(jur_count)
jur_count.plot(column='Count')
#Add gradient legend
from matplotlib.colors import Normalize
from matplotlib import cm
#Set up figure and axis with different size
f, ax = plt.subplots(1, figsize=(15,15))
#Remove axis
ax.set_axis_off()
#Set title
ax.set_title('Wiretaps per Federal Jurisdiction, 2017', fontsize=25)
#Add basemap to axis
jur.plot(ax=ax, color='white', edgecolor='grey')
#Add second layer of polygons on the axis
jur_count.plot(column='Count', ax=ax, legend=False)
#Add legend
mn = jur_count.Count.min()
mx = jur_count.Count.max()
norm = Normalize(vmin=mn, vmax=mx)
n_cmap = cm.ScalarMappable(norm=norm, cmap="viridis")
n_cmap.set_array([])
ax.get_figure().colorbar(n_cmap, ax=ax, orientation='vertical', shrink=.5)
We can see that the jurisdictions with the most wiretaps are in Southern California, Texas, and Arizona. Chicago also has a high number of taps. We should take a closer look at these jurisdictions. Los Angeles and Texas would be particularly interesting due to their proximity to the border; wiretaps there likely relate to transnational drug trafficking investigations.
jur_tap = jur_tap.reset_index()
jt = pd.DataFrame(jur_tap.groupby('Jurisdiction').Offense.value_counts())
jt.head()
jt = jt.rename(columns={'Offense': 'Count'})
jt = jt.reset_index()
jt = pd.merge(jt, jur, left_on= 'Jurisdiction', right_on= 'DISTRICT')
jt.head()
jt = gpd.GeoDataFrame(jt)
#Plot the count per jurisdiction for a specific offense
def off_count(df=jt, offense='Offense'):
    #Subset to the requested offense (use the df parameter, not the global jt)
    d = df.loc[df['Offense'] == offense]
    #Set up figure and axis with different size
    f, ax = plt.subplots(1, figsize=(15,15))
    #Remove axis
    ax.set_axis_off()
    #Set title
    ax.set_title('2017 Wiretaps for Crime Type: ' + offense, fontsize=25)
    #Add baselayer
    jur.plot(ax=ax, color='white', edgecolor='grey')
    #Add second layer of polygons on the axis
    d.plot(column='Count', ax=ax, legend=False)
    #Add legend
    mn = d.Count.min()
    mx = d.Count.max()
    norm = Normalize(vmin=mn, vmax=mx)
    n_cmap = cm.ScalarMappable(norm=norm, cmap="viridis")
    n_cmap.set_array([])
    ax.get_figure().colorbar(n_cmap, ax=ax, orientation='vertical', shrink=.5)
    #Display
    plt.show()
off_count(offense='NARCOTICS')
jt.Offense.unique()
off_count(offense='DRUGS - ILLEGAL')
Note: Should combine narcotics and drugs - illegal for final report
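Merging the two drug categories could be done with a simple replace before counting or plotting; a sketch (the category strings match those in the Offense column):

```python
import pandas as pd

# Collapse the two drug categories into one before counting/plotting
offenses = pd.Series(['NARCOTICS', 'DRUGS - ILLEGAL', 'GAMBLING'])
offenses = offenses.replace({'DRUGS - ILLEGAL': 'NARCOTICS'})
print(offenses.value_counts())
```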
off_count(offense='CORRUPTION')
off_count(offense='CONSPIRACY')
off_count(offense='RACKETEERING')
off_count(offense='IMMIGRATION')
Interesting that Arizona is the only Federal Jurisdiction ordering wiretaps for the "crime" of immigration. This is not surprising given the state's aggressive law enforcement approach to illegal immigration. See pages 40-44 of Karla McKanders (2017) "The Subnational Response: Local Intervention in Immigration Policy and Enforcement" in Steven Bender and William Arrocha (eds.), Compassionate Migration and Regional Policy in the Americas, New York: Palgrave Macmillan.
off_count(offense='TERRORISM')
off_count(offense='SMUGGLING')
See comment above regarding immigration policy in Arizona. Wiretaps for the "crime" of smuggling may be related to US legislative efforts to criminalize immigrants by framing immigration as a form of human smuggling. See p. 10 of Susan Martin (2013), "US Immigration Reform," James A. Baker III Institute for Public Policy.
off_count(offense='LOANSHARKING')
off_count(offense='DRUGS - PRESCRIPTION')
off_count(offense='$LAUNDERING')
off_count(offense='GAMBLING')
off_count(offense='FRAUD')
off_count(offense='EMBEZZLEMENT')
off_count(offense='KIDNAPPING')
off_count(offense='EXTORTION')
off_count(offense='BRIBERY')
off_count(offense='MURDER')
off_count(offense='ROBBERY')
off_count(offense='OTHER')
j = jur_tap.loc[jur_tap['Installation'] == 'INSTALLED AND USED']
j.reset_index(inplace=True)
j.info()
#silence SettingWithCopyWarning for the column edits below
pd.options.mode.chained_assignment = None
j.dropna(subset=['Total Cost($)'], inplace=True)
j['Cost'] = j['Total Cost($)'].str.replace(',','')
j['Cost'] = pd.to_numeric(j['Cost'])
#How many observations have a cost of 0?
j.loc[j['Cost'] == 0].describe()
212 observations list a cost of 0. Why might this be? For some cases, perhaps the wiretap was never deployed. However, we can see from the table above that others were definitely used (see the max values for 'number of extensions,' 'arrests,' etc.). In these cases, perhaps the total cost had not yet been recorded because the operation was ongoing. Regardless, we will have to drop observations with costs of 0 because they would skew the mean cost.
#drop values where cost is 0: these were likely not deployed at all
j = j[j['Cost'] != 0]
j.info()
p = pd.DataFrame(j.groupby('Jurisdiction')['Cost'].mean())
p.head()
p = p.reset_index()
p = pd.merge(p, jur, left_on= 'Jurisdiction', right_on= 'DISTRICT')
p.head()
p.set_index('Jurisdiction', inplace=True)
p.head()
p.info()
p = gpd.GeoDataFrame(p)
#Set up figure and axis with different size
f, ax = plt.subplots(1, figsize=(15,15))
#Remove axis
ax.set_axis_off()
#Set title
ax.set_title('Mean Cost of Wiretaps per Federal Jurisdiction, 2017', fontsize=25)
#Add baselayer
jur.plot(ax=ax, color='white', edgecolor='grey')
#Add second layer of polygons on the axis
p.plot(column='Cost', ax=ax, legend=False)
#Add legend
mn = p.Cost.min()
mx = p.Cost.max()
norm = Normalize(vmin=mn, vmax=mx)
n_cmap = cm.ScalarMappable(norm=norm, cmap="viridis")
n_cmap.set_array([])
ax.get_figure().colorbar(n_cmap, ax=ax, orientation='vertical', shrink=.5)
#Display
plt.show()
a. Create a time series map that shows change over time per district for at least one variable.
b. Take a closer look at particular jurisdictions such as those in Southern California and Texas.
c. Single out certain outliers that incurred extremely high costs or led to large numbers of arrests.
d. Who is getting a "return on their investment?" Who is "striking out?"
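The "return on investment" question could start from a simple efficiency metric such as arrests per $10,000 spent. A minimal sketch on toy numbers (jurisdiction labels and figures are illustrative; real values would come from the prosecutor-reported records):

```python
import pandas as pd

# Toy stand-in for the reported records (values are made up)
toy = pd.DataFrame({
    'Jurisdiction': ['A', 'A', 'B'],
    'Arrests': [10, 6, 2],
    'Cost': [40_000, 40_000, 100_000],
})

# Arrests per $10,000 spent, per jurisdiction
agg = toy.groupby('Jurisdiction')[['Arrests', 'Cost']].sum()
agg['arrests_per_10k'] = agg['Arrests'] / agg['Cost'] * 10_000
print(agg.sort_values('arrests_per_10k', ascending=False))
```

Summing before dividing weights the metric by total spending, so one cheap lucky wiretap cannot dominate a jurisdiction's ranking.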